skip to main content


Search for: All records

Creators/Authors contains: "Thomas, Paul D"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide a more user friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.

     
    more » « less
  2. null (Ed.)
    Abstract PANTHER (Protein Analysis Through Evolutionary Relationships, http://www.pantherdb.org) is a resource for the evolutionary and functional classification of protein-coding genes from all domains of life. The evolutionary classification is based on a library of over 15,000 phylogenetic trees, and the functional classifications include Gene Ontology terms and pathways. Here, we analyze the current coverage of genes from genomes in different taxonomic groups, so that users can better understand what to expect when analyzing a gene list using PANTHER tools. We also describe extensive improvements to PANTHER made in the past two years. The PANTHER Protein Class ontology has been completely refactored, and 6101 PANTHER families have been manually assigned to a Protein Class, providing a high level classification of protein families and their genes. Users can access the TreeGrafter tool to add their own protein sequences to the reference phylogenetic trees in PANTHER, to infer evolutionary context as well as fine-grained annotations. We have added human enhancer-gene links that associate non-coding regions with the annotated human genes in PANTHER. We have also expanded the available services for programmatic access to PANTHER tools and data via application programming interfaces (APIs). Other improvements include additional plant genomes and an updated PANTHER GO-slim. 
    more » « less
  3. Abstract

    The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium. It is an essential resource to compare existing and new methods of orthology inference (the bedrock for many comparative genomics and phylogenetic analysis) over a standard dataset and through common procedures. The Quest for Orthologs Consortium is dedicated to maintaining the resource up to date, through regular updates of the Reference Proteomes and increasingly accessible data through the OpenEBench platform. For this update, we have added a new benchmark based on curated orthology assertion from the Vertebrate Gene Nomenclature Committee, and provided an example meta-analysis of the public predictions present on the platform.

     
    more » « less
  4. Abstract Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology—evolutionary relatedness—is in many cases readily determined based on sequence similarity analysis. By contrast, whether or not two genes directly descended from a common ancestor by a speciation event (orthologs) or duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit—from domains to networks. This meeting brought into light several challenges to come: leveraging orthology computations to get the most of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses. 
    more » « less
  5. Abstract

    Over the last 25 years, biology has entered the genomic era and is becoming a science of ‘big data’. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3–4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.

     
    more » « less
  6. Abstract The identification of orthologs—genes in different species which descended from the same gene in their last common ancestor—is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficulties (need to compare them on a common input dataset, absence of ground truth, computational cost of calling orthologs). To address this, the Quest for Orthologs consortium maintains a reference set of proteomes and provides a web server for continuous orthology benchmarking (http://orthology.benchmarkservice.org). Furthermore, consensus ortholog calls derived from public benchmark submissions are provided on the Alliance of Genome Resources website, the joint portal of NIH-funded model organism databases. 
    more » « less
  7. Abstract

    The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO—a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations—evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs)—mechanistic models of molecular “pathways” (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.

     
    more » « less
  8. Abstract

    We aim to enable the accurate and efficient transfer of knowledge about gene function gained fromArabidopsis thalianaand other model organisms to other plant species. This knowledge transfer is frequently challenging in plants due to duplications of individual genes and whole genomes in plant lineages. Such duplications result in complex evolutionary relationships between related genes, which may have similar sequences but highly divergent functions. In such cases, functional inference requires more than a simple sequence similarity calculation. We have developed an online resource, PhyloGenes (phylogenes.org), that displays precomputed phylogenetic trees for plant gene families along with experimentally validated function information for individual genes within the families. A total of 40 plant genomes and 10 non‐plant model organisms are represented in over 8,000 gene families. Evolutionary events such as speciation and duplication are clearly labeled on gene trees to distinguish orthologs from paralogs. Nearly 6,000 families have at least one member with an experimentally supported annotation to a Gene Ontology (GO) molecular function or biological process term. By displaying experimentally validated gene functions associated to individual genes within a tree, PhyloGenes enables functional inference for genes of uncharacterized function, based on their evolutionary relationships to experimentally studied genes, in a visually traceable manner. For the many families containing genes that have evolved to perform different functions, PhyloGenes facilitates the use of evolutionary history to determine the most likely function of genes that have not been experimentally characterized. Future work will enrich the resource by incorporating additional gene function datasets such as plant gene expression atlas data.

     
    more » « less
  9. Wood, V (Ed.)
    Abstract The Alliance of Genome Resources (the Alliance) is a combined effort of 7 knowledgebase projects: Saccharomyces Genome Database, WormBase, FlyBase, Mouse Genome Database, the Zebrafish Information Network, Rat Genome Database, and the Gene Ontology Resource. The Alliance seeks to provide several benefits: better service to the various communities served by these projects; a harmonized view of data for all biomedical researchers, bioinformaticians, clinicians, and students; and a more sustainable infrastructure. The Alliance has harmonized cross-organism data to provide useful comparative views of gene function, gene expression, and human disease relevance. The basis of the comparative views is shared calls of orthology relationships and the use of common ontologies. The key types of data are alleles and variants, gene function based on gene ontology annotations, phenotypes, association to human disease, gene expression, protein–protein and genetic interactions, and participation in pathways. The information is presented on uniform gene pages that allow facile summarization of information about each gene in each of the 7 organisms covered (budding yeast, roundworm Caenorhabditis elegans, fruit fly, house mouse, zebrafish, brown rat, and human). The harmonized knowledge is freely available on the alliancegenome.org portal, as downloadable files, and by APIs. We expect other existing and emerging knowledge bases to join in the effort to provide the union of useful data and features that each knowledge base currently provides. 
    more » « less